Speech Recognition Using Time Domain Features from Phase Space Reconstructions

Authors

  • Jinjin Ye
  • Michael Johnson

Abstract

A speech recognition system performs the task of automatically transcribing speech into text. As computing power has advanced and sophisticated tools have become available, there has been significant progress in this field. However, a large gap still exists between the performance of Automatic Speech Recognition (ASR) systems and that of human listeners. In this thesis, a novel signal analysis technique using Reconstructed Phase Spaces (RPS) is presented for speech recognition. The most widely used acoustic modeling techniques are currently based on frequency-domain feature extraction. The reconstructed phase space technique, taken from dynamical systems methods, instead addresses the acoustic modeling problem in the time domain. Such a method has the potential to capture nonlinear information that is usually ignored by the traditional linear model of human speech production. The features from this time-domain approach can be used for speech recognition when combined with statistical modeling techniques such as Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs). Issues associated with the RPS approach are discussed, and experiments are conducted on the TIMIT database. Most of this work focuses on isolated phoneme classification, with some extended work presented on continuous speech recognition. Direct statistical modeling of the RPS can be used for isolated phoneme recognition, while the Singular Value Decomposition (SVD) is used to extract frame-based features from the RPS that can be applied to both isolated phoneme recognition and continuous speech recognition.
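The two RPS ideas in the abstract, time-delay embedding of the waveform and SVD-based frame features, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: the helper names `embed` and `svd_frame_features` are hypothetical, and so are the embedding dimension `dim=3`, the lag `lag=6`, the 25 ms / 10 ms framing at TIMIT's 16 kHz sampling rate, and the choice of each frame's singular values as its feature vector. The statistical (GMM/HMM) modeling stage is not shown.

```python
import numpy as np


def embed(signal, dim, lag):
    """Time-delay embedding (one common convention).

    Row n of the result is the phase-space point
    [x[n], x[n + lag], ..., x[n + (dim - 1) * lag]].
    """
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal is too short for this dim/lag")
    # Column i holds the signal delayed by i * lag samples.
    return np.stack([x[i * lag : i * lag + n_points] for i in range(dim)], axis=1)


def svd_frame_features(signal, dim=3, lag=6, frame_len=400, hop=160):
    """Hypothetical frame-based RPS features: singular values per frame.

    Defaults assume 16 kHz audio (as in TIMIT): 400 samples = 25 ms frames,
    160 samples = 10 ms hop. dim and lag are illustrative, not tuned values.
    """
    x = np.asarray(signal, dtype=float)
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start : start + frame_len]
        frame = frame - frame.mean()               # remove DC offset in this frame
        traj = embed(frame, dim, lag)              # trajectory matrix, shape (N, dim)
        s = np.linalg.svd(traj, compute_uv=False)  # singular values, descending
        # Scale by 1/sqrt(N) so magnitudes do not depend on the point count.
        feats.append(s / np.sqrt(traj.shape[0]))
    return np.array(feats).reshape(-1, dim)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs // 10) / fs        # 100 ms of signal
    tone = np.sin(2 * np.pi * 200 * t)  # 200 Hz pure tone
    features = svd_frame_features(tone)
    print(features.shape)               # (8, 3)
```

For a pure tone the embedded trajectory is an ellipse lying in a plane through the origin, so the third singular value of every frame is essentially zero; a signal with richer (for example nonlinear or noisy) dynamics spreads energy into that third direction, which is the kind of structure such features are meant to expose.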


Similar resources

Improving the Performance of Continuous Speech Recognition Systems Using Features Extracted from Speech Manifolds in the Reconstructed Phase Space

The design of new feature extraction methods for the speech signal, and the combination of the information they provide, is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...

Joint Frequency Domain and Reconstructed Phase Space Derived Features for Speech Recognition

A novel method for speech recognition is presented, utilizing nonlinear/chaotic signal processing techniques to extract time-domain based, reconstructed phase space derived features. By exploiting the theoretical results derived in nonlinear dynamics, a distinct signal processing space called a reconstructed phase space can be generated where salient features (the natural distribution and trajec...

Speech recognition using reconstructed phase space features

This paper presents a novel method for speech recognition by utilizing nonlinear/chaotic signal processing techniques to extract time-domain based phase space features. By exploiting the theoretical results derived in nonlinear dynamics, a processing space called a reconstructed phase space can be generated where a salient model (the natural distribution of the attractor) can be extracted for s...

Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interaction. The aim of an SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features are extracted from the spectrogram ...

Publication date: 2004